sync: merge upstream/main (286 commits) — 2026-06-23#487
Merged
… an empty summary (#50297) When an OpenAI-compatible proxy (e.g. cmkey.cn, one-api Anthropic channels) returns a well-formed HTTP 200 whose summary content is null or empty/whitespace-only, _generate_summary coerced it to "" and stored a prefix-only summary, silently replacing the compacted turns with nothing. The model then lost all in-progress context after compression (#11978, #11914). _validate_llm_response already guards None / empty-choices, so those never reach the compressor; the gap was a well-formed response with empty *content*. Now treat empty content as a summary failure: raise so it routes through the existing main-model fallback and then the transient cooldown, dropping the turns without a summary rather than wiping context with an empty one. Also narrow the bare 'except RuntimeError' so only genuine 'No LLM provider configured' errors take the 600s no-provider cooldown; empty/invalid-response RuntimeErrors from a configured provider now correctly get the main-model fallback instead of being misrouted into the long no-provider cooldown. Reported by @Hung2124; area identified by @annguyenNous in #39590.
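The empty-content guard described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: `SummaryError`, `extract_summary`, and `summarize_with_fallback` are hypothetical names, and the real fallback and cooldown routing is more involved.

```python
class SummaryError(RuntimeError):
    """Hypothetical error type for an unusable summary response."""


def extract_summary(content):
    """Return the stripped summary text, or raise if nothing usable came back."""
    # None, "" and whitespace-only content all count as a failed summary.
    if content is None or not content.strip():
        raise SummaryError("summary response had empty content")
    return content.strip()


def summarize_with_fallback(primary, fallback):
    """Try the primary summarizer, then the fallback; None means both failed."""
    for call in (primary, fallback):
        try:
            return extract_summary(call())
        except SummaryError:
            continue
    return None
```

Raising instead of coercing to an empty string is what lets the caller distinguish "no summary" from "a summary that happens to be empty".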
…#36908) The 'Session compressed N times — accuracy may degrade' warning went through _vprint (CLI stdout only), so the Ink TUI / Telegram / Discord never saw it — unlike the two other compression warnings in the same module, which route through _emit_status (and store _compression_warning for late-bound gateway status_callback replay). Set agent._compression_warning + call agent._emit_status() for this warning too, matching the sibling pattern. _emit_status still _vprints for the CLI, so CLI output is unchanged; TUI / gateway surfaces now receive it via status_callback (and replay_compression_warning can re-deliver it once a late-bound gateway callback is wired). Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
Funnel session finalization through AIAgent.close() — the single terminal path every agent (CLI, gateway, subagent, cron) funnels through — so finished agents stop leaving rows with ended_at IS NULL. The biggest leak source was delegate_task subagent + background-review forks whose close() never ended their row. end_session() is first-reason-wins and no-ops on an already-ended row, so a 'compression'/'cron_complete'/'cli_close' reason set by an earlier terminal path is never clobbered. /resume already calls reopen_session(), so finalizing-on-close does not break resumability. Temporary helper agents that rotate/share the session forward (manual compression, gateway session-hygiene) opt out via _end_session_on_close=False. Also stop the long-running gateway heartbeat once the executor is done or the session slot is rebound to a different agent, preventing a stale 'running: delegate_task' bubble from outliving its run. Closes #12029.
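A first-reason-wins finalizer can be expressed as a conditional UPDATE. The sketch below assumes a hypothetical `sessions` table with `ended_at` and `end_reason` columns; the real schema and signature of `end_session()` may differ.

```python
import sqlite3
import time


def end_session(conn, session_id, reason):
    """End a session only if it is still open; return True if this call ended it."""
    cur = conn.execute(
        "UPDATE sessions SET ended_at = ?, end_reason = ? "
        "WHERE id = ? AND ended_at IS NULL",
        (time.time(), reason, session_id),
    )
    conn.commit()
    # rowcount is 0 when the row was already ended, so the first reason survives.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sessions (id TEXT PRIMARY KEY, ended_at REAL, end_reason TEXT)"
)
conn.execute("INSERT INTO sessions (id) VALUES ('s1')")

first = end_session(conn, "s1", "compression")
second = end_session(conn, "s1", "agent_close")  # hypothetical later reason
final_reason = conn.execute(
    "SELECT end_reason FROM sessions WHERE id = 's1'"
).fetchone()[0]
```

Because the `WHERE` clause filters on `ended_at IS NULL`, calling it again from `close()` is a no-op for rows an earlier terminal path already finalized.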
The background-review fork (fires ~every 10 turns) pins review_agent.session_id = agent.session_id — the parent's LIVE id — for prefix-cache parity, then calls close(). With session finalization now in close(), that would end the still-active parent session mid-conversation. Set _end_session_on_close = False on the fork so the real owner (CLI close / gateway reset / cron) finalizes the session instead. Follow-up to the #12029 fix.
The TUI /compress slash side-effect compressed the session, synced the key, and emitted session.info — but returned an empty string, so the user saw no 'Compressed: N → M messages / ~X → ~Y tokens' feedback. The CLI (_manual_compress) and gateway (slash_commands) paths both already call summarize_manual_compression; the TUI slash path was the lone gap. Snapshot history + rough token estimate before and after compaction and return the formatted summarize_manual_compression() feedback, mirroring the session.compress RPC handler. The estimate uses the same estimate_request_tokens_rough(system_prompt, tools) inputs as the RPC path, re-reading the system prompt after compaction (it may be rebuilt). Co-authored-by: liuhao1024 <sunsky.lau@gmail.com>
PostgreSQL's initdb refuses to run as root, so the embedded Hindsight daemon could never initialize its data directory under root. The daemon-start thread would fail, retry, and loop forever — each cycle reloading embedding models (~958MB RAM, ~33% CPU) with no user-visible error, leaving Hermes sluggish on a common VPS/cloud root setup. initialize() now detects root (os.geteuid() == 0) before spawning the daemon thread, disables local_embedded mode, and surfaces a clear warning to both the log and the terminal so the user knows to run as a non-root user or switch to cloud / local_external mode. Closes #13125. Co-authored-by: teknium1 <127238744+teknium1@users.noreply.github.com>
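A minimal sketch of the pre-spawn root check, assuming hypothetical helper names (`running_as_root`, `choose_mode`) and an illustrative warning string; only `os.geteuid() == 0` and the mode names come from the description above.

```python
import os


def running_as_root():
    """True when the effective UID is 0; False where geteuid is unavailable."""
    geteuid = getattr(os, "geteuid", None)  # absent on Windows
    return geteuid is not None and geteuid() == 0


def choose_mode(requested_mode, is_root):
    """Return (mode, warning). local_embedded is disabled under root."""
    if requested_mode == "local_embedded" and is_root:
        warning = (
            "local_embedded mode is disabled when running as root; "
            "run as a non-root user or switch to cloud / local_external mode"
        )
        return None, warning
    return requested_mode, None
```

Deciding before the daemon thread starts is what avoids the fail, retry, reload loop.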
Bedrock Claude routes through the AnthropicBedrock SDK and injects cache_control, so cached tokens are always reported, but the pricing table had no cache cost fields for any Bedrock model, so /usage showed "cost unknown" on every cached session. Also, cross-region inference profiles (us./global./eu. prefixes) never matched the bare pricing keys.
- Add cache_read/cache_write rates to the four Bedrock Claude rows (read 0.1x input, write 1.25x input per the Bedrock pricing page).
- Normalize the cross-region prefix in the Bedrock pricing lookup, mirroring is_anthropic_bedrock_model's prefix list.
Closes #50295.
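The prefix normalization and the cache-rate multipliers could look roughly like this. The model id and input rate used in the usage note are placeholders, not real pricing rows, and the function names are hypothetical.

```python
# Cross-region inference profile prefixes named in the commit message.
CROSS_REGION_PREFIXES = ("us.", "global.", "eu.")


def normalize_bedrock_model_id(model_id):
    """Strip a cross-region prefix so the id matches the bare pricing key."""
    for prefix in CROSS_REGION_PREFIXES:
        if model_id.startswith(prefix):
            return model_id[len(prefix):]
    return model_id


def cache_rates(input_rate):
    """Derive cache rates from the input rate: read 0.1x, write 1.25x."""
    return {
        "cache_read": input_rate * 0.1,
        "cache_write": input_rate * 1.25,
    }
```

For example, a placeholder id such as `us.anthropic.example-model` would look up the `anthropic.example-model` row after normalization.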
hermes config show printed the model dict raw via print(), bypassing the logging redactor; a custom-provider api_key (e.g. Cloudflare cfut_...) was shown in plaintext even with security.redact_secrets=true. Opaque tokens don't match any vendor-prefix regex, so structural key-name masking is required.
- Add redact_config_value(): recursively masks credential-shaped keys (api_key/token/secret/... exact-match) via mask_secret.
- Wrap the show_config model dump in it.
- Mask the set_config_value echo when the leaf key is credential-shaped (config set model.api_key routes to config.yaml, lowercase misses the .env allowlist).
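Structural key-name masking can be sketched as a small recursive walk. The key set below contains only the three names the message spells out (the full list is elided in the source), and `_mask` is a stand-in for `mask_secret`, whose actual output format is not shown here.

```python
# Assumption: only the three key names spelled out in the commit message.
CREDENTIAL_KEYS = {"api_key", "token", "secret"}


def _mask(value):
    """Stand-in for mask_secret(); the real masking format may differ."""
    return "***"


def redact_config_value(value, key=None):
    """Recursively mask values whose key name is an exact credential match."""
    if isinstance(value, dict):
        return {k: redact_config_value(v, k) for k, v in value.items()}
    if isinstance(value, list):
        return [redact_config_value(item, key) for item in value]
    if key in CREDENTIAL_KEYS and value:
        return _mask(value)
    return value
```

Matching on the key name rather than the value is what covers opaque tokens that no vendor-prefix regex recognizes.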
The salvaged #19820 unifies the write_file guard under _is_internal_file_tool_content with the message 'internal read_file display text'. Two tests added to test_file_read_guards.py after the PR branch point still asserted the old 'status text' wording. Update them to match the new (correct, more general) message.
…stion
Inbound image/audio/video payloads were buffered fully into process memory before being written to the cache, with no size limit. A large upload (Discord Nitro allows 500 MB) or a remote media URL in an inbound message pointing at a huge file could spike RAM and OOM-kill the gateway.
Enforce a configurable cap in the shared cache helpers (gateway/platforms/base.py) so the protection holds across every platform adapter, not one:
- cache_image/audio/video_from_bytes reject oversized payloads before writing (video was the gap in the original report; it is now covered).
- cache_image/audio_from_url stream the body, rejecting on an oversized Content-Length header and re-checking the running total per chunk so an absent/lying header can't smuggle an unbounded body past the cap.
- Discord's _read_attachment_bytes checks att.size up front, so an oversized attachment is rejected before any bytes are pulled into memory.
Configurable via gateway.max_inbound_media_bytes in config.yaml (default 128 MiB; 0 disables). No new env var: non-secret config lives in config.yaml.
Salvaged and extended from @sgaofen's PR #13341 (the original report and the shared-helper approach). Reapplied onto current main (Discord adapter has since moved to plugins/platforms/discord/), the configurable knob moved from an env var to config.yaml, and the video cache helper added.
Co-authored-by: Hermes Agent <noreply@nousresearch.com>
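The two-stage check (declared length first, running total per chunk second) can be sketched independently of any HTTP client. `read_capped` and `PayloadTooLarge` are hypothetical names; `chunks` stands in for whatever streaming iterator the adapter uses.

```python
class PayloadTooLarge(ValueError):
    """Raised when an inbound body exceeds the configured cap."""


def read_capped(chunks, max_bytes, declared_length=None):
    """Accumulate chunks, enforcing max_bytes (0 disables the cap)."""
    # Cheap early rejection when the header already admits to being too big.
    if max_bytes and declared_length is not None and declared_length > max_bytes:
        raise PayloadTooLarge(f"declared {declared_length} > cap {max_bytes}")
    buf = bytearray()
    for chunk in chunks:
        # Re-check the running total so a missing or lying header cannot bypass the cap.
        if max_bytes and len(buf) + len(chunk) > max_bytes:
            raise PayloadTooLarge(f"body exceeded cap {max_bytes}")
        buf.extend(chunk)
    return bytes(buf)
```

Checking before extending the buffer keeps peak memory at or below the cap plus one chunk.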
…timeout (#50312) A turn forcibly interrupted by the drain-timeout escalation never reaches turn_finalizer.finalize_turn (the only place that flushes the turn to state.db). Its in-flight tool rounds live only in the in-memory _session_messages, so the immediate pre-restart turn was silently dropped from load_transcript() on resume. _finalize_shutdown_agents now flushes _session_messages to the SQLite session store before teardown. The flush is idempotent (identity-tracked in _flush_messages_to_session_db), so agents that finished gracefully re-flush nothing. The resume_pending / fresh-tool-tail branches in _handle_message_with_agent already expect a transcript whose tail may be a pending tool result. Fixes #13121.
Rich messages are not ready for primetime: current Telegram clients can render Bot API 10.1 rich messages as blank/unsupported bubbles and make them hard to copy as plain text, which is worse than the legacy MarkdownV2 path for command snippets and mobile handoffs. Default the rich_messages toggle to False so replies stay on the copyable legacy path; users opt in per bot via platforms.telegram.extra.rich_messages: true. Updates adapter, gateway config default, example config, English + zh-Hans docs, and the default/opt-in tests.
… (#50325) hermes backup only walks HERMES_HOME, so memory providers that keep config/credentials in home-anchored dotdirs (honcho -> ~/.honcho, hindsight -> ~/.hindsight, openviking -> ~/.openviking) lost that data across a backup/import cycle — the peer IDs, session pairings, and API keys never made it into the archive. Add an optional MemoryProvider.backup_paths() hook (default []). The active provider declares its external paths; backup resolves them from config only (no init, no network), archives the ones under the home dir into a reserved _external/ subtree encoded relative to home, and import restores them to their original location with a home-anchored traversal guard and 0600 on credential-shaped files. Paths outside home are skipped as non-portable. honcho, hindsight, and openviking override the hook. E2E-validated full backup->import cycle plus 7 new tests.
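A home-anchored traversal guard for the import side might look like the following. The `_external/` prefix comes from the description above; `restore_target` and the exact encoding of member names are assumptions for illustration.

```python
from pathlib import Path

EXTERNAL_PREFIX = "_external/"


def restore_target(home, member_name):
    """Map an archive member under _external/ back to a path inside home."""
    if not member_name.startswith(EXTERNAL_PREFIX):
        raise ValueError("not an external member")
    relative = member_name[len(EXTERNAL_PREFIX):]
    home_dir = Path(home).resolve()
    target = (home_dir / relative).resolve()
    # Reject anything that resolves outside home (../ escapes, absolute paths).
    if home_dir not in target.parents:
        raise ValueError(f"{member_name!r} escapes the home directory")
    return target
```

Resolving before comparing means `..` segments are collapsed first, so a crafted member name cannot pass the check textually and escape at write time.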
… DB corruption (#50331) A shell-launched 'hermes gateway run --replace' / 'gateway restart' on a systemd/launchd host can leave an orphan gateway whose kanban dispatcher escapes the service cgroup, survives 'systemctl restart', and becomes a second long-lived writer on the shared kanban.db. Two dispatchers that each believe they own the file both pass SQLite busy_timeout and then race on WAL frames — the documented root cause of multi-writer corruption (issue #35240). The existing _guard_supervised_gateway_conflict startup guard blocks the common way an orphan is born, but does nothing once a second dispatcher already exists. This adds the defense-in-depth: dispatch_once now wraps every tick in a non-blocking, board-scoped flock (_dispatch_tick_lock). A losing dispatcher returns DispatchResult(skipped_locked=True) and does zero DB writes this tick — so two dispatchers can never run a reclaim/spawn/write sequence concurrently regardless of how the second one got there. - Non-blocking (LOCK_NB): never stalls the gateway's async watcher. - Board-scoped: lock file is a .dispatch.lock sibling of each board's kanban.db, so unrelated boards tick in parallel. - POSIX + Windows (fcntl / msvcrt LK_NBLCK), no-op degrade where neither exists — mirrors the existing _cross_process_init_lock pattern. Verified with a real two-process orphan repro: while a separate process holds the lock, dispatch_once skips; after release it runs.
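A non-blocking, file-scoped tick lock can be sketched with `fcntl.flock` on POSIX. This covers only the POSIX branch plus the no-op degrade; the Windows `msvcrt` branch is omitted, and the dict return value is a stand-in for `DispatchResult`.

```python
import contextlib
import os
import tempfile

try:
    import fcntl
except ImportError:  # no flock available: degrade to a no-op lock
    fcntl = None


@contextlib.contextmanager
def dispatch_tick_lock(lock_path):
    """Yield True if the lock was acquired, False if another holder has it."""
    if fcntl is None:
        yield True
        return
    fd = os.open(lock_path, os.O_CREAT | os.O_RDWR, 0o600)
    try:
        try:
            fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)  # never blocks
            acquired = True
        except OSError:
            acquired = False
        try:
            yield acquired
        finally:
            if acquired:
                fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)


def dispatch_once(lock_path, work):
    """Run one tick, or skip it entirely when the lock is held elsewhere."""
    with dispatch_tick_lock(lock_path) as acquired:
        if not acquired:
            return {"skipped_locked": True}
        work()
        return {"skipped_locked": False}


lock_path = os.path.join(tempfile.mkdtemp(), ".dispatch.lock")
```

Keeping the lock file next to each board's database is what makes the lock board-scoped: two boards use two files and never contend.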
_collect_delegate_child_ids() walks the _delegate_from marker chain to gather delegate subagents for cascade deletion, but started its visited set empty. When the chain loops back onto a parent — a delegation cycle, or a parent that is also another parent's delegate child when several ids are deleted together — that parent was collected as one of its own descendants and then permanently deleted, along with all of its messages, by _delete_delegate_children(). Seed the visited set with the parent ids so they can never be re-collected, and exclude them from the returned child set. Callers (delete_session, bulk delete) remove the parents separately, so this only prevents the unintended parent deletion; legitimate child collection is unchanged. Add regression tests (in-memory sqlite) covering single/multi-level delegate chains, the parent_session_id+marker branch, untagged children (orphan-don't-delete contract), and the cycle case that previously leaked the parent into the deletion set. Fixes #49148
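Seeding the visited set with the parents is the whole fix, which a small graph walk makes concrete. Here `children_of` is a hypothetical in-memory mapping standing in for the `_delegate_from` marker lookups.

```python
def collect_delegate_child_ids(parent_ids, children_of):
    """Collect all descendants of parent_ids, never returning a parent itself."""
    parents = set(parent_ids)
    visited = set(parents)  # seeded so a cycle cannot re-collect a parent
    stack = list(parents)
    children = set()
    while stack:
        current = stack.pop()
        for child in children_of.get(current, ()):
            if child in visited:
                continue
            visited.add(child)
            children.add(child)
            stack.append(child)
    return children
```

With an empty initial visited set, the cycle `a -> b -> a` would put `a` into its own child set, which is the deletion bug described above.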
…HTTP path (#50319) * fix(api-server): stop silently promising async delivery on stateless HTTP path terminal(notify_on_complete=True / watch_patterns) and delegate_task(background=True) silently no-op'd on the API server / WebUI path (#10760): the watcher / detached child registered, but every API-server route (OpenAI-spec /v1/chat/completions and /v1/responses, plus the proprietary /v1/runs SSE stream) tears down its channel when the turn ends, and APIServerAdapter.send() is a no-op stub. A completion that fires after the response closed had nowhere to go — from the agent side, indistinguishable from a hang. There is no spec-compliant surface to wake the agent later on a stateless HTTP client, so make the no-op honest instead of silent: - Add a per-adapter capability flag supports_async_delivery (default True; APIServerAdapter = False), propagated into a HERMES_SESSION_ASYNC_DELIVERY contextvar via async_delivery_supported(). Toggle on the adapter, not a hardcoded platform string — a future stateless adapter is correct-by-default. - terminal: when delivery is unsupported, skip watcher registration, force notify_on_complete off, and return a notify_unsupported note telling the agent to process(action='poll'). - delegate_task: when delivery is unsupported, fall back to SYNCHRONOUS execution (work runs and returns in the same response) with a note, instead of handing out a handle that never resolves. CLI (in-process completion_queue) and the real gateway platforms are unchanged. Fixes #10760 * refactor(api-server): route session binding through a single no-delivery chokepoint Add APIServerAdapter._bind_api_server_session() and route both agent-entry paths (_run_agent for /v1/chat/completions + /v1/responses, and the /v1/runs _run_sync path) through it. 
The helper hardwires platform="api_server" and async_delivery=False with no async_delivery parameter to pass, so a future route added to the API server physically cannot reintroduce the silent no-op (#10760) by forgetting to mark the channel as non-delivering. The binding stays request-scoped (cleared per turn), so a session resumed later on a delivering interface (CLI / gateway platform) re-binds fresh and is NOT blocked — the no-delivery decision tracks the interface handling the current turn, never the session.
…cooldown
Closes #50185
Two independent gaps let a transient Photon/Spectrum upstream overflow
degrade message delivery and amplify gRPC pressure:
1. _is_retryable_error did not recognise Photon- or Envoy-specific error
strings ("internal sidecar error", "upstream connect error",
"reset reason: overflow"), so _send_with_retry fell through to the
plain-text fallback immediately instead of backing off and retrying.
2. send_typing had no rate gate, so a burst of typing-indicator calls
during an overflow event kept hitting the upstream gRPC connection and
widened the failure window.
Fix:
- Add _PHOTON_RETRYABLE_PATTERNS with the three high-specificity Envoy /
sidecar substrings and override _is_retryable_error on PhotonAdapter to
check them after delegating to the base-class patterns. base.py and all
other adapters are untouched.
- Add a 5 s per-chat cooldown in send_typing backed by _typing_last_sent.
stop_typing clears the entry so the next start after a completed turn
fires immediately — only rapid consecutive starts without a stop are
suppressed.
- Reduce PhotonAdapter._send_with_retry default max_retries from 2 to 1
(single 2 s back-off check) — enough to confirm whether the Envoy
circuit-breaker has opened, without adding unnecessary latency.
All changes are scoped to plugins/platforms/photon/adapter.py.
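The two Photon changes above, pattern-based retry classification and the per-chat typing cooldown, can be sketched as follows. The three substrings and the 5 s cooldown come from the message; `TypingGate`, the injectable clock, and the case-insensitive match are assumptions.

```python
import time

PHOTON_RETRYABLE_PATTERNS = (
    "internal sidecar error",
    "upstream connect error",
    "reset reason: overflow",
)


def is_retryable_error(message, base_check=lambda m: False):
    """Delegate to the base check first, then try the Photon-specific substrings."""
    if base_check(message):
        return True
    lowered = message.lower()  # assumption: matching is case-insensitive
    return any(pattern in lowered for pattern in PHOTON_RETRYABLE_PATTERNS)


class TypingGate:
    """Suppress rapid consecutive typing starts per chat."""

    def __init__(self, cooldown=5.0, clock=time.monotonic):
        self._cooldown = cooldown
        self._clock = clock
        self._last_sent = {}

    def should_send(self, chat_id):
        now = self._clock()
        last = self._last_sent.get(chat_id)
        if last is not None and now - last < self._cooldown:
            return False
        self._last_sent[chat_id] = now
        return True

    def stop(self, chat_id):
        # Clearing the entry lets the next start fire immediately.
        self._last_sent.pop(chat_id, None)
```

Only starts without an intervening stop are throttled, so a new turn still shows its typing indicator right away.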
When the Node spectrum-ts sidecar process exited mid-session (crash,
OOM, upstream overflow escalation), _supervise_sidecar returned
silently — readline hit EOF, the log-pump loop broke, and nothing
notified the gateway. _inbound_loop entered an infinite retry loop
against a dead port, _running stayed True, and the adapter remained
in self.adapters with no path to self-recovery short of a manual
gateway restart.
Add a death-detection tail to _supervise_sidecar: after the log-pump
exits (EOF or exception), guard on _inbound_running to distinguish
unexpected death from a deliberate disconnect(). On unexpected exit,
call _set_fatal_error("SIDECAR_CRASHED", retryable=True) followed by
_notify_fatal_error() so the reconnect watcher picks up the platform
within 30 s and retries with exponential backoff (30 s → 300 s cap)
until the sidecar comes back up. All other platforms remain unaffected.
The _inbound_running guard is safe against races: disconnect() sets
_inbound_running = False before _stop_sidecar() cancels the supervisor
task. CancelledError is BaseException, not Exception, so it bypasses
the except clause and propagates normally — the detection block never
runs during a clean shutdown.
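The race argument rests on `asyncio.CancelledError` deriving from `BaseException` (Python 3.8+), which a small stand-alone experiment can show. `supervise`, the `state` dict, and the `events` list are hypothetical stand-ins for the adapter's supervisor, `_inbound_running`, and the fatal-error notification.

```python
import asyncio


async def supervise(pump, state, events):
    """Run the log pump, then flag a crash if we were not asked to stop."""
    try:
        await pump()
    except Exception as exc:  # CancelledError is NOT caught here
        events.append(f"pump error: {exc}")
    if state["inbound_running"]:
        events.append("SIDECAR_CRASHED")


async def demo():
    events = []

    async def eof():  # sidecar exited: the pump returns on EOF
        return None

    await supervise(eof, {"inbound_running": True}, events)

    async def forever():  # healthy sidecar: the pump never returns
        await asyncio.sleep(3600)

    state = {"inbound_running": True}
    task = asyncio.create_task(supervise(forever, state, events))
    await asyncio.sleep(0)  # let the task start
    state["inbound_running"] = False  # disconnect() flips the flag first...
    task.cancel()  # ...then cancels the supervisor
    try:
        await task
    except asyncio.CancelledError:
        events.append("cancelled")
    return events


events = asyncio.run(demo())
```

The unexpected exit records a crash, while the cancelled supervisor never reaches the detection block at all.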
…tection Follow-up for salvaged PR #50256. Unit tests for the three behaviors: retryable classification of Envoy/sidecar overflow strings, per-chat typing cooldown with stop_typing reset, and the _supervise_sidecar crash-detection path that raises a retryable fatal (and the clean-shutdown no-op).
The read_file device guard now walks symlink hops before the file operation layer, but that hop walk still interpreted relative paths against the Python process cwd. In sessions where TERMINAL_CWD points at the task workspace, a relative workspace symlink to a blocked alias such as /dev/../dev/stdin could therefore miss the intermediate device target before later task-cwd resolution. Anchor relative device checks to the task base before symlink-hop inspection so the pre-I/O guard sees the same workspace path that read_file would otherwise read. Absolute device paths and the existing final realpath fallback remain unchanged. Refs #10141 Refs #29158
…50341) On resource-contended hosts the embedded Hindsight daemon can exceed a single 2s /health check; upstream then waits a grace window before treating it as stale and killing+restarting it (hindsight-embed reads HINDSIGHT_EMBED_PORT_HEALTH_GRACE_TIMEOUT, default 30s, into a module-level constant at import time). Users on busy boxes had no Hermes-side way to raise it short of hand-setting an env var. Add a 'port_health_grace_timeout' config.json option to the Hindsight plugin. When set, initialize() exports it to the process env BEFORE daemon_embed_manager is imported (the import-time read is the contract). setdefault() so an explicit operator env override always wins. Exposed in 'hermes memory setup' for local_embedded mode. Follow-up to #50308 / issue #13125 comment thread.
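The export-before-import ordering with `setdefault()` can be sketched like this. The env var and config key names are taken from the message; `export_grace_timeout` and the injectable `environ` mapping are illustrative.

```python
import os

ENV_NAME = "HINDSIGHT_EMBED_PORT_HEALTH_GRACE_TIMEOUT"


def export_grace_timeout(config, environ=os.environ):
    """Export the configured grace timeout unless the operator already set one."""
    value = config.get("port_health_grace_timeout")
    if value is None:
        return environ.get(ENV_NAME)
    # setdefault keeps an explicit operator override intact.
    environ.setdefault(ENV_NAME, str(value))
    return environ[ENV_NAME]
```

This has to run before the module that reads the variable is imported, because a module-level constant is evaluated once at import time.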
…imeout Fixes a regression introduced by the prior approach (synchronous import hermes_cli.gateway inside _lifespan) that caused a new failure mode: the blocking import stalled the asyncio event loop before uvicorn could bind its port, pushing HERMES_DASHBOARD_READY past the desktop shell's 45 s announcement deadline and triggering a respawn loop that accumulated orphaned backend processes. Two-part fix: _lifespan: replace the blocking import with a fire-and-forget run_in_executor call (_warm_gateway_module). The import runs in a worker thread while the server socket is already open, so HERMES_DASHBOARD_READY fires without delay. get_status: replace the inline lazy import with await run_in_executor(None, _resolve_restart_drain_timeout). This is the root fix for the original 15 s socket-timeout: the blocking .pyc-compilation + Defender scan is offloaded to a thread, keeping the event loop free for every /api/status probe. After the first call the module is in sys.modules and the executor returns in microseconds. Both helpers are extracted as module-level sync functions so they can be unit-tested independently of FastAPI or uvicorn. Closes #50209 Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
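Offloading a blocking call with `run_in_executor` keeps the event loop responsive, which a ticker task can demonstrate. `slow_resolve` below is a stand-in for the slow import; the sleep duration and return value are arbitrary.

```python
import asyncio
import time


def slow_resolve():
    """Stand-in for a blocking import or .pyc compilation."""
    time.sleep(0.2)
    return 30


async def get_status():
    loop = asyncio.get_running_loop()
    # The blocking work runs in the default thread pool, not on the event loop.
    return await loop.run_in_executor(None, slow_resolve)


async def demo():
    ticks = 0

    async def ticker():
        nonlocal ticks
        while True:
            await asyncio.sleep(0.01)
            ticks += 1

    task = asyncio.create_task(ticker())
    value = await get_status()
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return value, ticks


value, ticks = asyncio.run(demo())
```

If `slow_resolve()` were called directly inside the coroutine, the ticker would not advance at all while it slept.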
Three tests covering the scenarios from issue #50209 that could not be validated with real Defender on a fresh install:
1. test_lifespan_warmup_is_nonblocking
Patches _warm_gateway_module to sleep 3 s. Measures TestClient startup time, which must complete in < 1.5 s, proving the fire-and-forget run_in_executor does not block the event loop before port binding (HERMES_DASHBOARD_READY timing proxy).
2. test_get_status_does_not_block_event_loop
Patches _resolve_restart_drain_timeout to sleep 3 s. Fires concurrent GET /api/status and GET /api/version requests. /api/version must respond in < 3 s while /api/status waits, proving the event loop stays free during the slow import (15 s socket timeout would not fire).
3. test_concurrent_status_probes_all_respond
Three simultaneous /api/status probes with the slow patch; all must return HTTP 200 (no connection resets, no orphan accumulation).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The port-announcement clock in waitForDashboardPort starts the instant the backend process is spawned — before uvicorn binds its socket. On a cold install the child first compiles and imports the whole hermes_cli.main -> web_server -> FastAPI/uvicorn chain, and on Windows real-time AV scans every freshly written .pyc. That pre-bind cost can exceed the old hardcoded 45s deadline, so the desktop killed a healthy-but-still-starting backend and respawned it, piling up orphaned processes (#50209). Raise the default to 90s and make it overridable via HERMES_DESKTOP_PORT_ANNOUNCE_TIMEOUT_MS, clamped to a 45s floor so a bad override can't reintroduce the loop. Warm starts still announce in well under a second; both call sites inherit the new default with no change. Adds backend-ready.test.cjs (wired into test:desktop:platforms).
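The default-plus-floor arithmetic is sketched below in Python for consistency with the other examples; the desktop shell itself is not Python. The function name and the handling of unparseable overrides are assumptions.

```python
DEFAULT_ANNOUNCE_TIMEOUT_MS = 90_000
ANNOUNCE_TIMEOUT_FLOOR_MS = 45_000


def resolve_announce_timeout_ms(raw_override):
    """Return the announce timeout, clamped so an override cannot go below the floor."""
    if raw_override is None:
        return DEFAULT_ANNOUNCE_TIMEOUT_MS
    try:
        value = int(raw_override)
    except (TypeError, ValueError):
        # Assumption: an unparseable override falls back to the default.
        return DEFAULT_ANNOUNCE_TIMEOUT_MS
    return max(value, ANNOUNCE_TIMEOUT_FLOOR_MS)
```

The floor only limits how low an override can go; raising the timeout above 90 s is unaffected.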
…) (#50342) Add a platform-neutral send-failure vocabulary so consumers can branch on a typed category instead of substring-matching the raw provider message. - base.py: SEND_ERROR_KINDS + classify_send_error() (too_long / bad_format / forbidden / not_found / rate_limited / transient / unknown), and an optional SendResult.error_kind field (defaults None — fully backward compatible). - telegram.py: populate error_kind on send() failures; message_too_long keeps its existing error token plus error_kind='too_long'. Purely additive: no behavioral change to the existing degrade-and-deliver paths (MarkdownV2->plain-text fallback, overflow split, retry classification all untouched). 22 new tests + 210 adapter regression tests green.
After the agent's final response, the '...typing' bubble persisted ~5s. send() re-triggers send_typing() after every delivery so the bubble survives intermediate progress messages (Telegram clears typing on each delivered message). But that re-trigger also fired on the FINAL send, re-arming Telegram's ~5s timer AFTER the gateway had already torn down its typing-refresh loop — and Telegram exposes no stop-typing API, so nothing cancelled it. Gate the post-send re-trigger on the absence of metadata['notify'] (set only on the final user-visible reply via _mark_notify_metadata). Both the rich-message and legacy send paths are covered; intermediate progress sends still re-trigger so the bubble stays alive mid-response. Fixes #48678
Drop the inline-code border; halve the expanded tool block radius.
…leanup fix(desktop): manual tool previews via status stack
… project.facts RPC (#51259) Follow-up to the coding-context posture (#43316): that PR detects each repo's verify loop (manifests, package manager, exact test/lint/build commands, context files) and bakes it into the system-prompt snapshot — but only as a string, for the model. Non-prompt consumers (the desktop verify UI) had no way to read it without re-sniffing and drifting from the prompt. Split detection from rendering, keeping one source of truth: - `detect_project_facts(root) -> ProjectFacts` (frozen) holds the structured facts; `_project_facts()` now renders it into the same snapshot lines, so the prompt block stays byte-identical (cache-safe). - `project_facts_for(cwd)` resolves the workspace root (git, else marker) and returns the structured facts, or None outside a workspace. - `project.facts` gateway RPC surfaces it to any client (desktop/TUI/ACP). Tests assert the structured output and that the UI-facing commands never drift from what the prompt block renders (one detector feeds both).
A "one-shot" is a single stateless model call that runs OUTSIDE any conversation: it never touches session history, never breaks prompt caching, and returns plain text. UI surfaces need this for small generative chores — a commit message from a diff, a rename suggestion, a summary — where an agent turn would pollute the thread and hand-rolling an LLM call at every call site would be worse. - `agent/oneshot.py`: `run_oneshot(...)` over the existing auxiliary-client plumbing (same path as title generation). Two call shapes: explicit instructions/input, or a registered `template` + `variables` (templates own the prompt engineering so it stays consistent across CLI/TUI/desktop). Ships a `commit_message` template. Model selection inherits the live session via `main_runtime`, else the configured aux `task` backend. - `tui_gateway/server.py`: `llm.oneshot` RPC (long-handler) inheriting the session's model when `session_id` resolves. Stateless by construction — no session mutation, cache untouched.
… management plane (#51248)
The gateway half of Phase 6 Unit ζ: project the agent's existing relevance
knobs into the connector's platform-agnostic vocabulary and declare them at boot
over the /relay/policy route, so the SAME mention-gating / free-response /
allow-bots behavior the agent applies directly also governs relay delivery (and
excluded chatter never wakes a scaled-to-zero agent).
- gateway/relay/__init__.py:
- relay_relevance_policy(): project require_mention -> requireAddress,
free_response_channels -> freeResponseScopes, {PLATFORM}_ALLOW_BOTS in
{mentions,all} -> allowOtherBots. Reads the fronted platform's config block
+ bridged top-level keys. Returns None when all-default (the connector's
quiet default already matches) or no concrete platform is fronted.
- send_relay_policy(): POST /relay/policy authenticated with the gateway's own
per-gateway upgrade token (make_upgrade_token — same bearer as the WS
upgrade), so the connector attaches it to the authenticated instance, never
a body-asserted id. Re-declares every boot (self-healing, full replace).
NEVER raises, NEVER blocks boot — relevance is an optimization layered on
the δ/ε authorization gate. Reuses the per-gateway secret + the
/relay/provision host; no new inbound surface, no new credential.
- _policy_url(): ws(s)://…/relay -> http(s)://…/relay/policy.
- gateway/run.py: call send_relay_policy() after register_relay_adapter()
succeeds (the secret is resolved by then).
- docs/relay-connector-contract.md: new §7 documenting per-instance delivery +
the management plane (/manage/* + /relay/policy) + the relevance-declaration
contract; versioning renumbered to §8. Contract conformance test stays green
(§2/§3 tables untouched).
Tests: +12 (projection mapping incl. comma-string + top-level fallback; send
auth/skip/fail-soft/non-200). Full relay suite 118 pass. The connector route is
already E2E-proven (connector repo gateway_policy_driver.py); this adds the real
gateway send-path it pairs with.
This completes Phase 6 (Team Gateway per-user isolation) end to end.
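The `_policy_url()` mapping from the relay WebSocket URL to the policy route can be sketched with the standard library. The host in the usage note is a placeholder, and dropping the query and fragment is an assumption.

```python
from urllib.parse import urlsplit, urlunsplit


def policy_url(relay_ws_url):
    """Map ws(s)://host/relay to http(s)://host/relay/policy."""
    parts = urlsplit(relay_ws_url)
    scheme = {"ws": "http", "wss": "https"}.get(parts.scheme, parts.scheme)
    path = parts.path.rstrip("/") + "/policy"
    # Assumption: query string and fragment are not carried over.
    return urlunsplit((scheme, parts.netloc, path, "", ""))
```

For example, `policy_url("wss://connector.example/relay")` yields `https://connector.example/relay/policy`.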
…ailing
Slack in-app voice clips ("record a clip") arrive as MP4/AAC containers
(mimetype audio/mp4, filename audio_message*.mp4), and Slack sometimes
labels them video/mp4. The inbound audio handler derived the cache
extension from the mimetype and fell back to ".ogg" for anything not in
{.ogg,.mp3,.wav,.webm,.m4a} — so audio/mp4 voice messages were cached as
.ogg. OpenAI STT (whisper-1, gpt-4o-transcribe) sniffs the container from
the FILENAME extension, so it received MP4 bytes named .ogg and rejected
them. WhatsApp .ogg and uploaded .m4a worked only because their extension
happened to match the bytes.
Fix:
- _resolve_slack_audio_ext(): pick the cache extension from the real
filename first, then a mimetype map (audio/mp4 -> .m4a), defaulting to
.m4a — never the bogus .ogg fallback. Mirrors the video branch and the
audio map already in gateway/platforms/bluebubbles.py.
- _is_slack_voice_clip(): detect audio-only clips mislabeled video/mp4
via the slack_audio subtype / audio_message* filename, and route them
through the audio path (cached as audio, reported as audio/*) so they
reach STT instead of video understanding. Genuine videos (and
slack_video screen recordings) are left on the video path.
Verified end-to-end against a real audio-only MP4: old path cached it as
.ogg (ffprobe shows MP4 bytes -> container mismatch -> OpenAI rejects);
new path caches it as .mp4 (extension matches bytes -> accepted).
Adds inbound-audio tests (previously none): helper unit tests plus
_handle_slack_message E2E coverage for audio/mp4, video/mp4-mislabeled
voice clips, and a real video staying on the video path. Confirmed the
two voice-message tests fail without the fix (mutation check).
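The filename-first extension choice and the voice-clip detection can be sketched as below. Only the `audio/mp4 -> .m4a` mapping, the `.m4a` default, the `slack_audio` subtype, and the `audio_message*` filename pattern come from the message; the other map entries and the function signatures are assumptions.

```python
import os

# Only audio/mp4 -> .m4a is stated in the message; the rest are assumed.
_MIME_TO_EXT = {
    "audio/mp4": ".m4a",
    "audio/ogg": ".ogg",
    "audio/mpeg": ".mp3",
    "audio/wav": ".wav",
    "audio/webm": ".webm",
}


def resolve_slack_audio_ext(filename, mimetype):
    """Prefer the real filename extension, then the mimetype map, else .m4a."""
    ext = os.path.splitext(filename or "")[1].lower()
    if ext:
        return ext
    return _MIME_TO_EXT.get((mimetype or "").lower(), ".m4a")


def is_slack_voice_clip(mimetype, subtype, filename):
    """Detect audio-only clips that Slack labels as video/mp4."""
    if mimetype != "video/mp4":
        return False
    return subtype == "slack_audio" or (filename or "").startswith("audio_message")
```

A clip named `audio_message_1.mp4` therefore keeps `.mp4`, which matches its bytes, instead of being renamed to `.ogg`.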
Follow-up to the salvaged voice-clip fix: the rerouted video/mp4 branch
used {".m4a": "audio/mp4"}.get(ext, "audio/mp4"), whose sole key's value
equals the default, so it always returned "audio/mp4" regardless of the
cached extension (dead lookup + a throwaway dict per inbound voice clip).
Replace it with a module-level _SLACK_EXT_TO_AUDIO_MIME map so the reported
media_type matches the bytes we cached (e.g. a clip cached as .wav now
reports audio/wav instead of audio/mp4). STT routing already keys on the
audio/ prefix + cached filename extension, so behavior is unchanged; this
just removes the dead construct and keeps the reported mimetype coherent.
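The dead lookup is easy to see side by side with a real map. The old expression is quoted from the message; the entries of the replacement map beyond `.m4a` and `.wav` are assumptions, as is the helper name.

```python
def old_media_type(ext):
    # The sole key's value equals the default, so the result never varies.
    return {".m4a": "audio/mp4"}.get(ext, "audio/mp4")


# Module-level map; entries other than .m4a and .wav are assumed.
_SLACK_EXT_TO_AUDIO_MIME = {
    ".m4a": "audio/mp4",
    ".mp4": "audio/mp4",
    ".wav": "audio/wav",
    ".ogg": "audio/ogg",
    ".mp3": "audio/mpeg",
    ".webm": "audio/webm",
}


def audio_media_type(ext):
    """Report a mimetype that matches the cached extension."""
    return _SLACK_EXT_TO_AUDIO_MIME.get(ext.lower(), "audio/mp4")
```

Hoisting the map to module level also avoids building a throwaway dict for every inbound clip.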
…(#51121) A Medium-integrity Hermes agent cannot drive High-integrity (admin) windows on Windows — UIPI blocks UIA enumeration and mouse injection (SOM returns 0 elements, clicks silently no-op, screenshots still work, keyboard partially bypasses). OS constraint affecting every Windows automation stack, not a cua-driver bug. Document the symptom + the run-elevated workaround. Closes #49067.
Heavy PR checks run on every PR because the workflows deliberately avoid `on.paths` filters — a path-gated workflow leaves its required check pending forever when no matching file changes, blocking merge. So a docs-only PR still spins up the TypeScript matrix, the full Python suite, and ruff/ty. Keep every workflow triggering on every PR (checks always report) but gate the expensive *steps* on what the PR touches. Skipping a step (not the job) leaves the job green, so required checks never hang — the same idiom already proven in contributor-check.yml. A classifier (scripts/ci/classify_changes.py) maps the PR diff to three lanes — python, frontend, site — surfaced as step outputs by a composite action (.github/actions/detect-changes). Fail-open: an empty diff or any .github/ change runs everything; python is a denylist (skipped only when every file is provably prose or a frontend-only package); skills/**/SKILL.md counts as python-relevant since the skill-doc tests read that tree. Non-PR events always run the full pipeline.
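A fail-open, denylist-style classifier could be sketched as follows. The directory prefixes and prose suffixes are hypothetical placeholders, not the contents of scripts/ci/classify_changes.py; only the fail-open rules, the three lanes, and the SKILL.md exception come from the description above.

```python
# Hypothetical path rules for illustration.
FRONTEND_PREFIXES = ("ui-tui/", "apps/")
SITE_PREFIXES = ("website/",)
PROSE_SUFFIXES = (".md", ".txt")


def _is_skill_doc(path):
    return path.startswith("skills/") and path.endswith("/SKILL.md")


def _python_irrelevant(path):
    """True only when a file provably cannot affect the Python lane."""
    if _is_skill_doc(path):
        return False  # skill-doc tests read this tree
    return path.endswith(PROSE_SUFFIXES) or path.startswith(
        FRONTEND_PREFIXES + SITE_PREFIXES
    )


def classify_changes(paths):
    """Map changed paths to lanes; an empty diff or .github/ change runs all."""
    paths = list(paths)
    if not paths or any(p.startswith(".github/") for p in paths):
        return {"python": True, "frontend": True, "site": True}
    return {
        "python": any(not _python_irrelevant(p) for p in paths),
        "frontend": any(p.startswith(FRONTEND_PREFIXES) for p in paths),
        "site": any(p.startswith(SITE_PREFIXES) for p in paths),
    }
```

Treating python as a denylist means an unrecognized file type runs the Python suite by default, which is the safe direction for a required check.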
The image build + smoke test + integration suite are the heaviest jobs in CI (~9-11 min) and ran on every PR. Gate them to push-to-main and release: a broken build surfaces on the main push, while the cheap pre-merge guards (docker-lint hadolint/shellcheck, uv-lockfile-check) still run on PRs to catch the common Dockerfile/lockfile breakage. Steps skip on PRs so the job stays green; the dead PR-only arm64 cache-warm build is removed.
`npm ci` / `uv sync` / toolchain header fetches occasionally die on transient network blips — e.g. node-pty's node-gyp fetching Node headers (an undici assert) during the typecheck job's `npm ci`, which killed the job before `tsc` ever ran. "Re-run and it goes green" is exactly what CI should do itself. - New reusable `.github/actions/retry` composite action wraps a command and retries on failure (3x / 10s, command passed via env so it can't inject). Applied to every PR-path network install: npm ci (typecheck, desktop build, docs site), uv sync (tests, e2e), uv tool install (lint), pip install (docs site). - typecheck now runs `npm ci --ignore-scripts`: `tsc` needs only sources + type defs, so skipping install scripts drops node-pty's native rebuild (whose header fetch was the flake) and is faster. Validated locally — tsc passes for ui-tui, apps/shared, and apps/desktop with scripts skipped. - ripgrep download uses `curl --retry`. Docker (main-only) and the release/windows workflows are intentionally left for a follow-up.
ci: centralize path-gating behind single orchestrator + all-checks-pass gate
Replace the scattered per-workflow detect-changes pattern with a single ci.yml orchestrator that runs the classifier once, then conditionally calls sub-workflows via workflow_call based on lane outputs. A final all-checks-pass job (if: always()) aggregates all results so branch protection only needs to require one check.
Changes:
- New .github/workflows/ci.yml orchestrator (detect + conditional calls + all-checks-pass gate)
- Extend classify_changes.py with scan/deps/mcp_catalog lanes, absorbing supply-chain-audit's internal changes job
- Update detect-changes/action.yml to expose the new lane outputs
- Convert all 10 PR-gated sub-workflows to workflow_call-only triggers, removing their push/pull_request triggers and per-step detect-changes guards (gating now happens at the orchestrator level)
- lint.yml + supply-chain-audit.yml receive event_name as a workflow_call input to replace github.event_name (which is "workflow_call" inside called workflows)
- supply-chain-audit.yml: remove internal changes job + *-gate jobs (orchestrator handles gating, booleans arrive as inputs)
- contributor-check.yml: remove internal filter step
- Update test_classify_changes.py for 6-lane output + new supply-chain test cases
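The aggregation rule of the all-checks-pass job can be illustrated with a small sketch. The function name is hypothetical and the job names in the example are placeholders. The result strings are the values GitHub Actions reports for `needs.<job>.result` (`success`, `failure`, `cancelled`, `skipped`), and a skipped job counts as passing.

```python
def all_checks_pass(results):
    """Aggregate per-job results into one verdict.

    `results` maps a job name to its result string. Returns a tuple
    (passed, failed_job_names), with the names sorted for stable output.
    """
    acceptable = {"success", "skipped"}
    failed = sorted(name for name, result in results.items() if result not in acceptable)
    return (not failed, failed)
```

For example, `all_checks_pass({"tests": "success", "docs-site": "skipped"})` passes, which is what lets a docs-only PR merge when the test lane never ran.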
…verride
The installer scanned PATH/well-known locations for a Chrome/Chromium binary
and, when found, skipped the bundled Playwright Chromium download and wrote that
path into ~/.hermes/.env as AGENT_BROWSER_EXECUTABLE_PATH. On Snap-based systems
`command -v chromium` resolves to /snap/bin/chromium, whose sandbox blocks
agent-browser's control socket under /tmp -- so every browser_navigate hung
until the 60s timeout fired ("opening web page failed").
Drop the system-browser fallback entirely (per maintainer direction):
find_system_browser()/Find-SystemBrowser now honor ONLY an explicit, user-set
AGENT_BROWSER_EXECUTABLE_PATH override -- no PATH scan, no well-known-path scan.
A /snap/* path is rejected even when set explicitly, since its confinement is
the bug. Applied to both install.sh (Linux/macOS) and install.ps1 (Windows).
Crucially, also auto-repair already-affected installs: the bad snap path
persists in .env and is read directly by the runtime, and the installer skips
re-config when AGENT_BROWSER_EXECUTABLE_PATH is already set ("already
configured"), so a plain reinstall/update never recovered an existing user. New
strip_snap_browser_override() removes a snap-pointing AGENT_BROWSER_EXECUTABLE_PATH
(and its auto-written comment) from .env on every install/update, run from both
browser-setup paths (install_node_deps and ensure_browser), so updating is
enough to recover. A deliberately-set non-snap override is left untouched.
docker/stage2-hook.sh is intentionally untouched: it discovers the bundled
Playwright Chromium, not a system browser.
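The auto-repair step lives in install.sh / install.ps1, but its effect on the .env file can be sketched in Python for clarity. The marker text in `AUTO_COMMENT` is a hypothetical stand-in for the installer's auto-written comment, whose real wording is not shown here.

```python
KEY = "AGENT_BROWSER_EXECUTABLE_PATH"
AUTO_COMMENT = "# Auto-detected system browser"  # hypothetical marker text


def strip_snap_browser_override(env_text):
    """Drop a snap-pointing override (and its auto-written comment) from .env text."""
    kept = []
    for line in env_text.splitlines():
        name, sep, value = line.partition("=")
        value = value.strip().strip("'\"")
        if sep and name.strip() == KEY and value.startswith("/snap/"):
            # Also remove the comment the installer wrote directly above the entry.
            if kept and kept[-1].strip() == AUTO_COMMENT:
                kept.pop()
            continue
        kept.append(line)
    result = "\n".join(kept)
    if kept and env_text.endswith("\n"):
        result += "\n"
    return result
```

A deliberately set non-snap override passes through unchanged, matching the behavior described above.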
…epair
Replace the old "skips download when a system browser exists" assertions with tests for the new behavior:
- no PATH scan for browser command names, and the "use the system browser" path is gone;
- find_system_browser consults only an explicit AGENT_BROWSER_EXECUTABLE_PATH override (which still skips the bundled download);
- strip_snap_browser_override runs on both install paths and a /snap/* path is rejected, so already-affected installs auto-recover on update.
…tyle (#51168) Adds a per-platform display.reasoning_style setting (code | blockquote | subtext) controlling how the show_reasoning summary renders on the gateway. Discord defaults to "subtext" (-# small grey metadata text); every other platform keeps the fenced code block. Resolves through the existing display.platforms.<platform>.reasoning_style override chain.
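A rough sketch of how such a setting might resolve and render. The lookup order (per-platform override, then a global `display.reasoning_style`, then the platform default) is an assumption, as are both function names; only the three style names, the Discord default, and the `-# ` subtext prefix come from the description above.

```python
VALID_STYLES = {"code", "blockquote", "subtext"}
PLATFORM_DEFAULTS = {"discord": "subtext"}  # every other platform falls back to "code"


def resolve_reasoning_style(config, platform):
    """Pick the first valid style from the (assumed) override chain."""
    display = config.get("display", {})
    per_platform = display.get("platforms", {}).get(platform, {})
    candidates = (
        per_platform.get("reasoning_style"),
        display.get("reasoning_style"),
        PLATFORM_DEFAULTS.get(platform),
    )
    for candidate in candidates:
        if candidate in VALID_STYLES:
            return candidate
    return "code"


def render_reasoning(text, style):
    """Render a reasoning summary in the chosen style."""
    lines = text.splitlines() or [""]
    if style == "subtext":
        return "\n".join(f"-# {line}" for line in lines)
    if style == "blockquote":
        return "\n".join(f"> {line}" for line in lines)
    fence = "`" * 3  # built at runtime to avoid nesting a literal fence here
    return f"{fence}\n{text}\n{fence}"
```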
Authorship-first upstream sync. 11 real conflicts resolved (see PR body).
KEEP OURS (fork evolution features preserved):
- scripts/release.py AUTHOR_MAP, tools/lazy_deps.py telemetry.otel (#167): union-merged with upstream additions.
- hermes_cli/tools_config.py: kept #165 security review trail, took upstream's safer use_shell + env=_cua_driver_env() form.
- prompt_builder.py / system_prompt.py / tqmemory_setup.py: auto-merged clean, verified #485 guidance + TQMEMORY_PROJECT_ROOT intact.
- conversation_loop.py loop-guard #432/#436 + cron digest hooks intact.
TAKE UPSTREAM (upstream-owned infra / supersets):
- .github/workflows/{tests,supply-chain-audit}.yml: workflow_call orchestration via new ci.yml + detect-changes action (replaces fork's inline #108 changes job).
- .github/workflows/docker-publish.yml: upstream PR-build structure + fork's newer dependabot action pins.
- build-windows-installer.yml: accepted upstream deletion (#c820eb6a5).
- tools/terminal_tool.py: upstream superset (fork watcher routing + #10760 guard).
- agent_runtime_helpers.py memory mirror: upstream notify_memory_tool_write (consolidates fork on_memory_write; adds success-gating).
- conversation_loop.py: upstream #39550 token-aware compression retry.
FOLLOW UPSTREAM REMOVAL: google-gemini-cli provider (#50492)
Upstream removed this provider across 13+ files in non-conflicting ways (auto-merged) + deleted agent/google_oauth.py. Keeping the fork's provider would require surgical re-integration across 13+ files (out of scope for a sync PR, perpetual future conflicts). Followed upstream's clean removal: reverted the 2 keep-ours hunks (run_agent.py, agent_runtime_helpers.py) and dropped the fork's adapter/test/plan. Tree is now consistent; gemini-cli tests no longer exist.
** Owner: if the gemini-cli provider must stay, re-add it as a dedicated feature branch, NOT in this sync. **
NOT auto-merged into main; escalated to owner review per big-sync policy (PR #405 precedent).
Contributor
🔎 Lint report:
| Rule | Count |
|---|---|
| unresolved-import | 57 |
| unresolved-attribute | 41 |
| invalid-argument-type | 19 |
| invalid-assignment | 14 |
| unsupported-operator | 10 |
| not-subscriptable | 9 |
| invalid-method-override | 2 |
| not-iterable | 2 |
| invalid-return-type | 2 |
| call-non-callable | 1 |
| unresolved-reference | 1 |
| invalid-parameter-default | 1 |
| no-matching-overload | 1 |
First entries
tests/tools/test_approval_interrupt.py:131: [invalid-assignment] invalid-assignment: Object of type `() -> dict[str, int]` is not assignable to attribute `_get_approval_config` of type `def _get_approval_config() -> dict[Unknown, Unknown]`
tests/gateway/test_whatsapp_bridge_pidfile.py:179: [unresolved-attribute] unresolved-attribute: Attribute `readline` is not defined on `None` in union `IO[Any] | None`
plugins/memory/mem0/_backend.py:163: [unresolved-import] unresolved-import: Cannot resolve imported module `psycopg2`
tests/gateway/test_approval_prompt_redaction.py:114: [unresolved-attribute] unresolved-attribute: Object of type `AST` has no attribute `lineno`
tests/gateway/test_whatsapp_bridge_pidfile.py:85: [unsupported-operator] unsupported-operator: Operator `+` is not supported between objects of type `int | None` and `Literal[1]`
tools/process_registry.py:541: [invalid-argument-type] invalid-argument-type: Argument to constructor `float.__new__` is incorrect: Expected `str | Buffer | SupportsFloat | SupportsIndex`, found `Unknown | int | str | ... omitted 16 union elements`
tests/honcho_plugin/test_oauth_flow.py:17: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
tests/hermes_cli/test_web_server_boot_handshake.py:29: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
tests/hermes_cli/test_update_zip_atomic_replace.py:17: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
tests/gateway/test_tui_approval_redaction.py:14: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
tools/process_registry.py:540: [invalid-argument-type] invalid-argument-type: Method `__getitem__` of type `bound method str.__getitem__(key: SupportsIndex | slice[SupportsIndex | None, SupportsIndex | None, SupportsIndex | None], /) -> str` cannot be called with key of type `Literal["daemon_term_grace_seconds"]` on object of type `str`
tests/gateway/test_whatsapp_to_jid.py:10: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
hermes_cli/tools_config.py:3360: [unsupported-operator] unsupported-operator: Operator `in` is not supported between objects of type `dict[Unknown, Unknown]` and `str | list[dict[str, str | list[Unknown]] | dict[str, str | list[Unknown] | bool | list[str]] | dict[str, str | list[dict[str, str]]]] | list[dict[str, str | list[Unknown] | bool | list[str]] | dict[str, str | list[dict[str, str]]]] | ... omitted 5 union elements`
tools/computer_use/cua_backend.py:390: [invalid-argument-type] invalid-argument-type: Argument is incorrect: Expected `str`, found `Any | None | Literal[""]`
plugins/memory/mem0/_setup.py:855: [invalid-assignment] invalid-assignment: Invalid subscript assignment with key of type `Literal["_mode_from_flag"]` and value of type `Literal[False]` on object of type `dict[str, str]`
tests/hermes_cli/test_kanban_lifecycle_hooks.py:131: [unresolved-attribute] unresolved-attribute: Attribute `status` is not defined on `None` in union `Task | None`
tests/hermes_cli/test_goals.py:1339: [unsupported-operator] unsupported-operator: Operator `in` is not supported between objects of type `Literal["run pytest"]` and `str | None`
plugins/memory/honcho/oauth_flow.py:270: [invalid-method-override] invalid-method-override: Invalid override of method `log_message`: Definition is incompatible with `BaseHTTPRequestHandler.log_message`
tests/hermes_cli/test_web_server_boot_handshake.py:95: [unresolved-import] unresolved-import: Cannot resolve imported module `anyio`
tools/url_safety.py:359: [unsupported-operator] unsupported-operator: Operator `in` is not supported between objects of type `Literal["%"]` and `str | int`
tests/honcho_plugin/test_oauth_flow.py:337: [unresolved-import] unresolved-import: Cannot resolve imported module `fastapi`
tests/hermes_cli/test_goals.py:1078: [invalid-assignment] invalid-assignment: Object of type `int | float` is not assignable to attribute `waiting_until` on type `GoalState | None`
tests/gateway/test_whatsapp_bridge_pidfile.py:20: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
tools/browser_tool.py:1354: [unresolved-import] unresolved-import: Cannot resolve imported module `psutil`
tests/plugins/memory/test_mem0_setup.py:6: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
... and 135 more
✅ Fixed issues (28):
| Rule | Count |
|---|---|
| unresolved-attribute | 13 |
| unresolved-import | 7 |
| invalid-argument-type | 4 |
| invalid-assignment | 3 |
| unsupported-operator | 1 |
First entries
agent/google_oauth.py:782: [unresolved-attribute] unresolved-attribute: Attribute `set` is not defined on `None` in union `Event | None`
tests/skills/test_google_oauth_setup.py:332: [invalid-argument-type] invalid-argument-type: Argument to function `module_from_spec` is incorrect: Expected `ModuleSpec`, found `ModuleSpec | None`
agent/google_oauth.py:187: [invalid-assignment] invalid-assignment: Object of type `None` is not assignable to `<module 'fcntl'>`
tests/skills/test_google_oauth_setup.py:365: [unresolved-import] unresolved-import: Cannot resolve imported module `googleapiclient`
tests/hermes_cli/test_tools_config.py:735: [invalid-argument-type] invalid-argument-type: Argument to function `_detect_active_provider_index` is incorrect: Expected `list[Unknown]`, found `str | list[dict[str, str | list[Unknown]] | dict[str, str | list[Unknown] | bool | list[str]] | dict[str, str | list[dict[str, str]]]] | list[dict[str, str | list[Unknown] | bool | list[str]] | dict[str, str | list[dict[str, str]]]] | ... omitted 4 union elements`
tests/skills/test_google_oauth_setup.py:108: [unresolved-attribute] unresolved-attribute: Unresolved attribute `Flow` on type `ModuleType`
hermes_cli/inventory.py:178: [invalid-assignment] invalid-assignment: Object of type `None` is not assignable to `def is_aggregator(provider: str) -> bool`
agent/google_oauth.py:235: [unresolved-attribute] unresolved-attribute: Module `msvcrt` has no member `locking`
tests/agent/test_gemini_cloudcode.py:334: [unresolved-attribute] unresolved-attribute: Attribute `refresh_token` is not defined on `None` in union `GoogleCredentials | None`
agent/google_oauth.py:235: [unresolved-attribute] unresolved-attribute: Module `msvcrt` has no member `LK_UNLCK`
agent/google_oauth.py:907: [invalid-assignment] invalid-assignment: Object of type `() -> Literal[True]` is not assignable to `def _can_open_graphical_browser() -> bool`
hermes_cli/cli_commands_mixin.py:994: [unresolved-attribute] unresolved-attribute: Object of type `Self@_handle_gquota_command` has no attribute `_console_print`
tests/agent/test_gemini_cloudcode.py:289: [invalid-argument-type] invalid-argument-type: Argument is incorrect: Expected `str`, found `Unknown | str | int`
tests/skills/test_google_oauth_setup.py:334: [unresolved-attribute] unresolved-attribute: Attribute `loader` is not defined on `None` in union `ModuleSpec | None`
tests/agent/test_gemini_cloudcode.py:246: [unresolved-attribute] unresolved-attribute: Attribute `project_id` is not defined on `None` in union `GoogleCredentials | None`
tests/skills/test_google_oauth_setup.py:13: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
agent/background_review.py:527: [invalid-argument-type] invalid-argument-type: Argument to `AIAgent.__init__` is incorrect: Expected `str`, found `(Any & ~AlwaysFalsy & ~Literal["codex_app_server"]) | None | Literal["codex_responses"]`
agent/gemini_cloudcode_adapter.py:523: [unresolved-attribute] unresolved-attribute: Attribute `get` is not defined on `None` in union `Any | None | dict[str, Any]`
plugins/memory/mem0/__init__.py:175: [unresolved-import] unresolved-import: Cannot resolve imported module `mem0`
tests/plugins/memory/test_mem0_v2.py:198: [unresolved-attribute] unresolved-attribute: Attribute `join` is not defined on `None` in union `None | Thread`
tests/agent/test_gemini_cloudcode.py:247: [unresolved-attribute] unresolved-attribute: Attribute `managed_project_id` is not defined on `None` in union `GoogleCredentials | None`
tests/skills/test_google_oauth_setup.py:366: [unresolved-import] unresolved-import: Cannot resolve imported module `google_auth_oauthlib`
tests/plugins/memory/test_mem0_v2.py:10: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
tests/agent/test_gemini_cloudcode.py:22: [unresolved-import] unresolved-import: Cannot resolve imported module `pytest`
tests/skills/test_google_oauth_setup.py:109: [unresolved-attribute] unresolved-attribute: Unresolved attribute `flow` on type `ModuleType`
... and 3 more
Unchanged: 5870 pre-existing issues carried over.
Diagnostics are surfaced as warnings — this check never fails the build.
…ibution
Two CI regressions on the sync/upstream-2026-06-23 PR, both resolved minimally:
1. tests/gateway/test_13121_shutdown_inflight_transcript_flush.py (`assert 7 == 5`): This is upstream's #13121 test, brought in by the merge. It sets up the flush dedup state via upstream's `_flushed_db_message_ids` (set of id()) API, but the fork's `_flush_messages_to_session_db` uses the fork's `_flushed_db_messages` (list of pinned objects, the #160 id-recycling fix) which the merge correctly kept. The set attribute was ignored → already-flushed prior_history was re-written → 7 rows instead of 5. Fix: adapt the upstream test's declarative dedup setup to the fork's canonical `_flushed_db_messages` API (pin the actual prior_history dicts). Flush production logic is byte-identical to origin/main: no flush regression, only the imported test spoke the wrong API.
2. contributor-check / check-attribution: the 286-commit merge range includes one upstream contributor whose email is a bare `<id>@users.noreply.github.com` (no `+username`, so the workflow's auto-skip regex misses it) and was not in AUTHOR_MAP. Added `wnuuee1` mapping, the exact fix the check prescribes.
Verified: test_13121 6/6 pass (CPython); contributor-check reproduction 0 unmapped.
NOTE: the "test (6)" shard failure shares root cause with #1 (same gateway test; slice assignment is duration-cache-dependent). No other CPython-level regression found: fork flush + compaction code unchanged by the merge.
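For readers unfamiliar with the two dedup styles, here is a minimal sketch of the pinned-object approach. The class and method names are hypothetical, not the fork's code. A set of `id()` values can report a false "already flushed" once a flushed dict is garbage-collected and CPython reuses its address; holding references to the objects themselves rules that out.

```python
class FlushTracker:
    """Identity-based dedup that pins flushed messages so ids cannot be recycled."""

    def __init__(self):
        self._flushed = []  # strong references keep each flushed object alive

    def mark_flushed(self, messages):
        self._flushed.extend(messages)

    def pending(self, messages):
        # Compare by identity (`is`), not equality: two equal dicts are still
        # two distinct messages and both must be written.
        return [m for m in messages if not any(m is f for f in self._flushed)]
```

This also shows why the imported test wrote 7 rows: seeding a separate id set leaves the pinned list empty, so every prior message still looks pending.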
#365/#368)
My earlier merge resolution for tools/terminal_tool.py was WRONG: I labeled upstream's version a "superset" and took it wholesale (2746 lines), silently dropping the fork's entire terminal streak-classification mechanism (#365/#368, related to loop-guard #432). The classifier module survived but its integration in terminal_tool.py was lost. `tests/tools/test_terminal_failure_classifier.py` (26 tests) failed with AttributeError: no attribute '_reset_terminal_streak'.
Fix: redid the resolution as a proper git 3-way merge of just this file (merge-base ↔ origin/main ↔ upstream/main). Only ONE region conflicted, the watcher-routing block, which correctly resolves to upstream (#10760 async_delivery_supported guard is a true superset of the fork's routing there). Everything else auto-reapplied, restoring:
- _increment_terminal_streak / _reset_terminal_streak / get_terminal_streak
- `from tools.terminal_failure_classifier import ... streak_recommendation`
- streak_recommendation() usage + terminal_streak telemetry in the failure paths
…while keeping upstream's #10760 watcher-routing guard AND the fork watcher routing metadata. Result: 3149 lines (origin 3123 + upstream guard).
Verified: test_terminal_failure_classifier 26/26, test_terminal_tool 25/25, test_13121 6/6 (CPython).
Structural audit (top-level defs origin→HEAD across 201 fork-modified files) now shows only ONE remaining delta: _check_cua_driver_asset_for_arch, which upstream DELETED on purpose (#5f1d23cfb "delete broken pre-install asset probe") with a regression test asserting its absence; left removed as intended (not a loss).
Lexus2016 added a commit that referenced this pull request Jun 24, 2026
…odel-resolution fix (#486)
Restore the evolution cached-digest pre-flight as a tracked feature and fix the root bug that made it fail on prod with 'no model configured for pre-flight ping'.
What:
- cron/evolution_preflight.py: lightweight non-streaming provider ping plus most-recent on-disk digest fallback for evolution pipeline cron stages (introspection/analysis/implementation/research/funnel/integration).
- cron/scheduler.py: integrate the pre-flight between runtime resolution and AIAgent construction. On ping failure, return the latest stale digest (graceful degradation) or raise if none exists. Purely additive: the #487 native model-resolution path is unchanged.
Why:
- When the configured provider is unreachable (e.g. Kimi timeout storms), evolution sessions burn retries/timeouts and deliver nothing. The pre-flight detects this fast and falls back to the last good digest so the pipeline keeps moving with stale-but-structured input instead of failing silently.
ROOT-FIX:
- preflight_provider() reads runtime['model'], but resolve_runtime_provider() never populates it. The scheduler resolves the model into a separate local 'model' variable (job.model > HERMES_MODEL > config.yaml model.default) and passes it to AIAgent(model=...) directly. On prod runtime['model'] was empty so the ping always returned 'no model configured' and the cached-digest fallback could never trigger. Fixed by syncing runtime['model'] = model (the already-resolved local) before the ping, without clobbering an ACP-resolved model.
Provenance:
- This feature previously existed only as untracked local code on the prod host; git stash -u hid it and upstream-merge #487 overwrote scheduler.py with the native version. Now restored into git from that stash, with the root bug fixed.
Tested:
- tests/cron/test_evolution_preflight.py (27 tests) pass, including a new unit test and a scheduler-level test that reproduces the prod failure (runtime returned with NO model + empty job.model + config.yaml model.default) and asserts runtime['model'] is synced to the resolved default. Verified the new test FAILS without the root-fix (captured model == None) and PASSES with it.
- Adjacent suites green: test_scheduler_provider.py (29), scheduler model/runtime/run_job slice (77), test_run_one_job.py + codex paths (8).
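The root fix and the fallback can be sketched as below. All three function names are hypothetical, and the "only fill when empty" condition is an assumed reading of "without clobbering an ACP-resolved model"; the real scheduler code is not shown in this thread.

```python
def resolve_model(job_model, env_model, config_default):
    """Precedence from the commit message: job.model > HERMES_MODEL > config default."""
    return job_model or env_model or config_default or None


def sync_runtime_model(runtime, model):
    """Copy the resolved model into the runtime dict before the pre-flight ping.

    Assumption: a model already present (e.g. ACP-resolved) is left untouched.
    """
    if model and not runtime.get("model"):
        runtime["model"] = model
    return runtime


def preflight_or_digest(runtime, ping, latest_digest):
    """Return None to proceed normally, or a stale digest when the ping fails."""
    if ping(runtime):
        return None
    digest = latest_digest()
    if digest is None:
        raise RuntimeError("provider unreachable and no cached digest available")
    return digest
```

Without the sync step, `runtime.get("model")` stays empty, so any ping that reads it fails before reaching the provider, which is the prod symptom described above.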
Lexus2016 added a commit that referenced this pull request Jun 24, 2026
…odel-resolution fix (#486) (#490)
* feat(evolution): cached-digest provider pre-flight for cron stages, model-resolution fix (#486)
Restore the evolution cached-digest pre-flight as a tracked feature and fix the root bug that made it fail on prod with 'no model configured for pre-flight ping'.
What:
- cron/evolution_preflight.py: lightweight non-streaming provider ping plus most-recent on-disk digest fallback for evolution pipeline cron stages (introspection/analysis/implementation/research/funnel/integration).
- cron/scheduler.py: integrate the pre-flight between runtime resolution and AIAgent construction. On ping failure, return the latest stale digest (graceful degradation) or raise if none exists. Purely additive: the #487 native model-resolution path is unchanged.
Why:
- When the configured provider is unreachable (e.g. Kimi timeout storms), evolution sessions burn retries/timeouts and deliver nothing. The pre-flight detects this fast and falls back to the last good digest so the pipeline keeps moving with stale-but-structured input instead of failing silently.
ROOT-FIX:
- preflight_provider() reads runtime['model'], but resolve_runtime_provider() never populates it. The scheduler resolves the model into a separate local 'model' variable (job.model > HERMES_MODEL > config.yaml model.default) and passes it to AIAgent(model=...) directly. On prod runtime['model'] was empty so the ping always returned 'no model configured' and the cached-digest fallback could never trigger. Fixed by syncing runtime['model'] = model (the already-resolved local) before the ping, without clobbering an ACP-resolved model.
Provenance:
- This feature previously existed only as untracked local code on the prod host; git stash -u hid it and upstream-merge #487 overwrote scheduler.py with the native version. Now restored into git from that stash, with the root bug fixed.
Tested:
- tests/cron/test_evolution_preflight.py (27 tests) pass, including a new unit test and a scheduler-level test that reproduces the prod failure (runtime returned with NO model + empty job.model + config.yaml model.default) and asserts runtime['model'] is synced to the resolved default. Verified the new test FAILS without the root-fix (captured model == None) and PASSES with it.
- Adjacent suites green: test_scheduler_provider.py (29), scheduler model/runtime/run_job slice (77), test_run_one_job.py + codex paths (8).
* test(evolution): skip anthropic pre-flight tests when package absent (#486)
CI test shard 5 lacks the optional 'anthropic' package, so test_anthropic_success / test_anthropic_failure failed with ModuleNotFoundError instead of being skipped. Add a per-test pytest.importorskip('anthropic') guard so they SKIP when the optional dependency is missing. Applied per-test (not class-level) because the other TestPreflightProvider cases (openai, missing_api_key, missing_model, acp, root-fix unit) do not touch anthropic and must keep running.
Upstream sync: merge `upstream/main` (286 commits), 2026-06-23
Authorship-first merge (`git merge upstream/main`, real merge commit, ancestry preserved). NOT auto-merged; escalated to owner review per the big-sync policy (>80 commits / core touch; PR #405 precedent).
… `7cafa44a1` (origin/main) + `6cc07b6cd` (upstream/main).

Conflict resolutions
| File | Resolution |
|---|---|
| `.github/workflows/build-windows-installer.yml` | … (`c820eb6a5`); origin only had dependabot bumps. |
| `.github/workflows/docker-publish.yml` | … `if: github.event_name != 'pull_request'` build-on-main-only structure (`2977e7454`) + kept fork's newer dependabot action pins (buildx v4.1.0, build-push v7.2.0, setup-uv v8.2.0, login v4.2.0). |
| `.github/workflows/tests.yml` | … `workflow_call` sub-workflow of new `ci.yml` orchestrator. |
| `.github/workflows/supply-chain-audit.yml` | … `ci.yml`'s `detect` job via new `.github/actions/detect-changes`. Fork's inline `changes` / #108 logic superseded. |
| `scripts/release.py` | … |
| `tools/lazy_deps.py` | … `telemetry.otel` (#167) + upstream's `tool.computer_use`. |
| `tools/terminal_tool.py` | … `async_delivery_supported()` guard (#10760). Fork routing fully preserved + bug fix gained. |
| `hermes_cli/tools_config.py` | … `shell=use_shell` + `env=_cua_driver_env()` (Windows uses argv list, no shell); kept fork's #165 security-review comment. |
| `run_agent.py` | … |
| `agent/agent_runtime_helpers.py` | … `notify_memory_tool_write` (consolidates fork's `on_memory_write`, adds success-gating); (2) gemini-cli client branch removed, see below. |
| `agent/conversation_loop.py` | … |

… `google-gemini-cli` provider (#50492)
Upstream deleted the `google-gemini-cli` / `google-antigravity` OAuth providers wholesale (#50492). The fork had this as a flagship feature wired across ~23 files. During the merge, git auto-merged away the fork's gemini-cli wiring in 13+ non-conflicting files (`auth.py` 12→0 refs, `models.py` 5→0, `providers.py` 4→0, …) and deleted `agent/google_oauth.py`, leaving only 2 conflicted hunks where the fork branch survived.
Keeping the provider would have required surgical re-integration across 13+ files + undeleting `google_oauth.py`: out of scope for a sync PR and a guaranteed source of perpetual future conflicts. Resolution: followed upstream's clean removal, reverted the 2 keep-ours hunks and dropped the fork's `gemini_cloudcode_adapter.py` / `test_gemini_cloudcode.py` / `plans/gemini-oauth-provider.md`. The merged tree now has zero `google-gemini-cli` references and is internally consistent.
👉 Owner decision required: if the `google-gemini-cli` provider must stay in the fork, re-add it as a dedicated feature branch, not in this sync.

Owner verification items
- … `all-checks-pass` gate from the new `ci.yml`. The old per-job statuses (test (1..6), etc.) no longer report directly; leaving them required will permanently block PRs. (`all-checks-pass` uses `if: always()` and treats skipped as success, so the fork's #108 markdown-only-PR concern, "[FIX] Markdown-only PRs are permanently BLOCKED: tests.yml paths-ignore vs required test (1-6) statuses", is handled.)

Evolution features verified intact
- `prompt_builder.py`: TQMEMORY_GUIDANCE, DELIBERATE_WORK_GUIDANCE, AGENTS.md @import scanner, #485 ("fix(memory): harden TQMemory for client installs (stable project_id, index guard, timeout, preload, upgrade)") "FOCUSED Markdown roots" guard ✓
- `system_prompt.py`: TQMEMORY_GUIDANCE activation ✓
- `tqmemory_setup.py`: TQMEMORY_PROJECT_ROOT (5×), timeout 600, reinstall logic ✓
- `package.json` version `0.16.0` (fork branding) untouched ✓
- `skills/evolution/` (9), `cron/evolution/` (9), `scripts/evolution_*`, `register_evolution_cron.py` ✓
- `setup-hermes.sh` #485 HF preload + persistent gh auth blocks ✓

Tests
- `tests/agent/test_system_prompt.py` + `tests/hermes_cli/test_tqmemory_setup.py`: 29 passed
- `tests/agent/test_gemini_fast_fallback.py` (+ above): 36 passed
- `bash -n setup-hermes.sh` OK
- … (`auth`, `models`, `providers`, `provider_catalog`, `runtime_provider`): all import cleanly (no dangling refs)
- … `ci.yml`).

🤖 Prepared by automated sync agent. Independent second opinions (Gemini via consilium) consulted on the CI-orchestration and gemini-cli removal decisions.